Getting ready

Load packages

library(here) # [CRAN] A replacement for 'file.path()', locating the files relative to the project root
library(tidyverse) # [CRAN] Easily install and load the 'Tidyverse'   
library(RColorBrewer) # [CRAN] ColorBrewer Palettes
library(cowplot)  # [CRAN] Streamlined plot theme and plot annotations for 'ggplot2' 
library(plotly) # [CRAN] Create interactive web graphics via 'plotly.js'
library(phyloseq) # [Bioconductor] Handling and analysis of high-throughput microbiome census data 
library(philr) # [Bioconductor] Phylogenetic partitioning based ILR transform for metagenomics data
library(ape) # [CRAN] Analyses of phylogenetics and evolution
library(vegan) # [CRAN] Community ecology package 
library(gt) # [github::rstudio/gt] # Easily Create Presentation-Ready Display Tables

# Set seed
set.seed(1910)

Load data

load(here("data", "qiime2R", "phyloseq.RData"))
load(here("data", "qiime2R", "beta-diversity_distance.RData"))
load(here("data", "qiime2R", "beta-diversity_ordination.RData"))

# Extract metadata from the phyloseq object
metadata <- data.frame(sample_data(phyloseq), check.names = FALSE)

Ordination

As beta-diversity metrics are sensitive to differences in the sequencing depth of samples, a common way to mitagete this problem is to rarefy the feature table, i.e., downsample count data to the lowest count in the dataset. However, rarefying results in loss of sequence data and throws away samples with extremely low sequence counts. Alternative approaches for normaizing library sizes have been proposed (McMurdie et al., 2014), such as DESeq2’s variance stabilization technique (Love et al., 2014) and metagenomeSeq’s CSS (Paulson et al., 2013)). While these alternative methods showed promising results, they are not reccomended for presence/absence metrics, such as Jaccard or unweighted-UniFrac, as the resulting ordination may be distorted. For the presence/absence metrics, rarefying is the best solution at present. (Weiss et al., 2017)).

Another apporach to get around the issue of differential sequencing depth is to analyze the microbiome sequence data using compositional data analysis approaches. Microbiome sequence data are compositional, i.e., they are represented by relative abundances or proportions which individually carry no meaning for the absolute abundance of a specific feature (Aitchison, 1986). Applying standard statistical tools, such as t-test, Wilcoxon rank-sum test, ANOVA and Pearson correlation coefficient, directly to compositional data produces spurious resutls. The first step of compositional data analysis is to convert the relative abundances of each part, or the values in the table of counts for each part, to ratios between all parts, such that existing statistical methods may be applied without introducing spurious conclusions (Gloor et al., 2017). Commonly used data transformation include centered log-ratio transformation (CLR) and isometric log-ration transformation (ISL), during which the differences in the library sizes are cancelled out. This means that all the samples and sequence data are used without having to fit a distribution model to the data. Recently developed methods for beta-diversity analysis that take into account the compositional nature of the microbiome sequence data include PhILR and roboust Aitchison PCA.

In this section, both unweighted (presence/absence) and weighted metrics, computed from rarefied and unrarefied feature table respectively, will be used for ordination.

PCoA of Jaccard distance

Extract principal coordinates and variance explained for plotting.

pco_jac <- ord_jac$data$Vectors %>%
  full_join(., rownames_to_column(metadata, "SampleID"), by = "SampleID")

labs_jac <- paste0("PCo", 1:ncol(ord_jac$data$ProportionExplained), ": ", 
                   round(100*ord_jac$data$ProportionExplained, 1), "%")

2D plot

p_jac <- ggplot(pco_jac, aes(x = PC1, y = PC2, color = SampleType)) +
  geom_hline(yintercept = 0) +
  geom_vline(xintercept = 0) +
  geom_point(size = 4) +
  labs(title = "PCoA of Jaccard distance", color = "Sample type",
       x = labs_jac[1],
       y = labs_jac[2]) +
  scale_color_manual(values = brewer.pal(n = 12, name = "Paired")[c(2,1,8,7)]) +
  theme_cowplot() +
  panel_border(colour = "black")

p_jac

Interactive 3D plot

plot_ly(x = pco_jac[,"PC1"], y = pco_jac[,"PC2"], z = pco_jac[,"PC3"], 
        type = "scatter3d", mode = "markers", color = metadata$SampleType, 
        colors = brewer.pal(n = 12, name = "Paired")[c(2,1,8,7)]) %>%
        layout(scene = list(xaxis = list(title = labs_jac[1]),
                            yaxis = list(title = labs_jac[2]),
                            zaxis = list(title = labs_jac[3])
                        ))

PCoA of unweighted-UniFrac distance

Extract principal coordinates and variance explained for plotting.

pco_uwuf <- ord_uwuf$data$Vectors %>%
  full_join(., rownames_to_column(metadata, "SampleID"), by = "SampleID")

labs_uwuf <- paste0("PCo", 1:ncol(ord_uwuf$data$ProportionExplained), ": ", 
                   round(100*ord_uwuf$data$ProportionExplained, 1), "%")

2D plot

p_uwuf <- ggplot(pco_uwuf, aes(x = PC1, y = PC2, color = SampleType)) +
  geom_hline(yintercept = 0) +
  geom_vline(xintercept = 0) +
  geom_point(size = 4) +
  labs(title = "PCoA of unweighted-UniFrac distance", color = "Sample type",
       x = labs_uwuf[1],
       y = labs_uwuf[2]) +
  scale_color_manual(values = brewer.pal(n = 12, name = "Paired")[c(2,1,8,7)]) +
  theme_cowplot() +
  panel_border(colour = "black")

p_uwuf

Interactive 3D plot

plot_ly(x = pco_uwuf[,"PC1"], y = pco_uwuf[,"PC2"], z = pco_uwuf[,"PC3"], 
        type = "scatter3d", mode = "markers", color = metadata$SampleType, 
        colors = brewer.pal(n = 12, name = "Paired")[c(2,1,8,7)]) %>%
        layout(scene = list(xaxis = list(title = labs_uwuf[1]),
                            yaxis = list(title = labs_uwuf[2]),
                            zaxis = list(title = labs_uwuf[3])
                        ))

Roboust Aitchison PCA

Roboust Aitchison PCA (RPCA) is a compositional beta diversity metric rooted in a centered log-ratio transformation and matrix completion (Martino et al., 2019). Aitchison distance was used as the distance metric in the roboust Aitchison PCA for its desirable properties: 1)scale invariant, which ensures equivalence between distances computed from absolute and relative abundance measurements, negating the need to perform rarefaction; 2)relative changes driven. Microbes that display large fold change across samples will be weighted more heavily, which makes the ordination roboust to random fluctuations of high-abundant taxa; 3)subcompositionally coherent, which guarantees that distances will never decrease if additional taxa are observed. HOwever, Aitchison distance cannot handle zeros and is thus challenging to apply to the sparse microbiome data. To circumvent this issue, RPCA treats all zeros as missing values and builds a model to handle this missing data using matrix completion.

Note that RPCA is not exactly performing PCA. It is performing PCoA using the Aitchison distance, which is calculated from the Euclidean distance of the clr-transformed data. Since PCoA with Euclidean distance is equivalent to PCA, the method is called PCA though it’s in fact running PCoA.

Extract principal coordinates and variance explained for plotting.

pco_aitchison <- ord_aitchison$data$Vectors %>%
  full_join(., rownames_to_column(metadata, "SampleID"), by = "SampleID")

labs_aitchison <- paste0("PCo", 1:ncol(ord_aitchison$data$ProportionExplained), ": ", 
                         round(100*ord_aitchison$data$ProportionExplained, 1), "%")

2D plot

p_aitchison <- ggplot(pco_aitchison, aes(x = PC1, y = PC2, color = SampleType)) +
  geom_hline(yintercept = 0) +
  geom_vline(xintercept = 0) +
  geom_point(size = 4) +
  labs(title = "Robust Aitchison PCA", color = "Sample type",
       x = labs_aitchison[1],
       y = labs_aitchison[2]) +
  scale_color_manual(values = brewer.pal(n = 12, name = "Paired")[c(2,1,8,7)]) +
  theme_cowplot() +
  panel_border(colour = "black")

p_aitchison

Interactive 3D plot

plot_ly(x = pco_aitchison[,"PC1"], y = pco_aitchison[,"PC2"], z = pco_aitchison[,"PC3"], 
        type = "scatter3d", mode = "markers", color = metadata$SampleType, 
        colors = brewer.pal(n = 12, name = "Paired")[c(2,1,8,7)]) %>%
        layout(scene = list(xaxis = list(title = labs_aitchison[1]),
                            yaxis = list(title = labs_aitchison[2]),
                            zaxis = list(title = labs_aitchison[3])
                        ))

PCoA of Euclidean distance calculated on PhILR transformed data

PhILR is short for “Phylogenetic Isometric Log-Ratio Transform”. The goal of PhILR is to transform compositional data into an orthogonal unconstrained space (real space) with phylogenetic / evolutionary interpretation while preserving all information contained in the original composition. For a given set of samples consisting of measurements of taxa, we transform data into a new space of samples and orthonormal coordinates termed ‘balances’. Each balance is associated with a single internal node of a phylogenetic tree with the taxa as leaves. The balance represents the log-ratio of the geometric mean abundance of the two groups of taxa that descend from the given internal node. More details on this method can be found in the paper “A phylogenetic transform enhances analysis of compositional microbiota data”.

Filter and transform the feature table

In the original paper and PhILR pakcage tutorial, taxa that were not seen with more than 3 counts in at least 20% of samples or with a coefficient of variation ≤ 3 were filtered. For the present data set, we’ll not do data filtering as it results in great loss of data. We’ll just add a pseudocount of 1 to the feature table to avoid calculating log-ratios involving zeros.

#phyloseq <- filter_taxa(phyloseq, function(x) sum(x > 3) > (0.2 * length(x)), TRUE)
#phyloseq <- filter_taxa(phyloseq, function(x) sd(x)/mean(x) > 3.0, TRUE)
phyloseq <- transform_sample_counts(phyloseq, function(x) x + 1)

Process phylogenetic tree

Next we check that the tree is rooted and binary (all multichotomies have been resolved).

is.rooted(phy_tree(phyloseq)) # Is the tree Rooted?
## [1] TRUE
is.binary.tree(phy_tree(phyloseq)) # All multichotomies resolved?
## [1] TRUE

As the tree is not binary, we use the function multi2di from the ape package to replace multichotomies with a series of dichotomies with one (or several) branch(es) of zero length.

phy_tree(phyloseq) <- multi2di(phy_tree(phyloseq)) 
is.binary.tree(phy_tree(phyloseq)) 
## [1] TRUE

Now we name the internal nodes of the tree so they are easier to work with. We prefix the node number with n and thus the root is named n1.

phy_tree(phyloseq) <- makeNodeLabel(phy_tree(phyloseq), method = "number", prefix = 'n')

We note that the tree is already rooted with Bacteria as the outgroup and no multichotomies are present. This uses the function name.balance from the philr package. This function uses a simple voting scheme to find a consensus naming for the two clades that descend from a given balance. Specifically for a balance named x/y, x refers to the consensus name of the clade in the numerator of the log-ratio and y refers to the denominator.

name.balance(phy_tree(phyloseq), tax_table(phyloseq), 'n1')
## [1] "Order_o__Betaproteobacteriales/Kingdom_k__Bacteria"

Investigate dataset components

Finally we transpose the ASV table (philr uses the conventions of the compositions package for compositional data analysis in R, taxa are columns, samples are rows). Then we will take a look at part of the dataset in more detail.

table_philr <- t(otu_table(phyloseq))
table_philr[1:2,1:2] 
## OTU Table:          [2 taxa and 2 samples]
##                      taxa are columns
##          ATTGAACGCTGGCGGCATGCCTTACACATGCAAGTCGAACGGCAGCGGGCCCTTCGGGGTGCCGGCGAGTGGCGAACGGGTGAGTAATGCATCGGAACGTACCCGGGAGAGGGGGATAACCGGCCGAAAGGCTGGCTAATACCGCATGAACTCGGAAGAGCAAAGCGGGGGACCTTCGGGCCTCGCGCTCTCGGAGCGGCCGATGTCCGATTAGGCTAGTTGGTGGGGTAAAGGCCCACCAAGGCGACGATCGGTAGCGGGTCTGAGAGGACGATCCGCCACACTGGGACTGAGACACGGCCCAG
## AqFl2-01                                                                                                                                                                                                                                                                                                                 1
## AqFl2-02                                                                                                                                                                                                                                                                                                                 1
##          ATTGAACGCTGGCGGCATGCCTTACACATGCAAGTCGAACGGCAGCGGGGACTTCGGTCCGCCGGCGAGTGGCGAACGGGTGAGTAAAGTATCGGAACGTACCTTTCAGTGGGGGATAACGTAGCGAAAGTTACGCTAATACCGCATATTCTGTGAGCAGGAAAGCAGGGGATCGCAAGACCTTGCGCTGATTGAGCGGCCGATATCAGATTAGCTAGTTGGTGGGGTAAAGGCCTACCAAGGCGACGATCTGTAGCGGGTCTGAGAGGATGATCCGCCACACTGGAACTGAGACACGGTCCAG
## AqFl2-01                                                                                                                                                                                                                                                                                                               31
## AqFl2-02                                                                                                                                                                                                                                                                                                                1
tree <- phy_tree(phyloseq)
tree 
## 
## Phylogenetic tree with 1615 tips and 1614 internal nodes.
## 
## Tip labels:
##   ATTGAACGCTGGCGGCATGCCTTACACATGCAAGTCGAACGGCAGCGGGCCCTTCGGGGTGCCGGCGAGTGGCGAACGGGTGAGTAATGCATCGGAACGTACCCGGGAGAGGGGGATAACCGGCCGAAAGGCTGGCTAATACCGCATGAACTCGGAAGAGCAAAGCGGGGGACCTTCGGGCCTCGCGCTCTCGGAGCGGCCGATGTCCGATTAGGCTAGTTGGTGGGGTAAAGGCCCACCAAGGCGACGATCGGTAGCGGGTCTGAGAGGACGATCCGCCACACTGGGACTGAGACACGGCCCAG, ATTGAACGCTGGCGGCATGCCTTACACATGCAAGTCGAACGGCAGCGGGGACTTCGGTCCGCCGGCGAGTGGCGAACGGGTGAGTAAAGTATCGGAACGTACCTTTCAGTGGGGGATAACGTAGCGAAAGTTACGCTAATACCGCATATTCTGTGAGCAGGAAAGCAGGGGATCGCAAGACCTTGCGCTGATTGAGCGGCCGATATCAGATTAGCTAGTTGGTGGGGTAAAGGCCTACCAAGGCGACGATCTGTAGCGGGTCTGAGAGGATGATCCGCCACACTGGAACTGAGACACGGTCCAG, ATTGAACGCTGGCGGCATGCCTTACACATGCAAGTCGAACGGCAGCACGGGTGCTTGCACCTGGTGGCGAGTGGCGAACGGGTGAGTAATACATCGGAACATGTCCTGTAGTGGGGGATAGCCCGGCGAAAGCCGGATTAATACCGCATACGATCTACGGATGAAAGCGGGGGACCTTCGGGCCTCGCGCTATAGGGTTGGCCGATGGCTGATTAGCTAGTTGGTGGGGTAAAGGCCTACCAAGGCGACGATCAGTAGCTGGTCTGAGAGGACGACCAGCCACACTGGGACTGAGACACGGCCCAG, ATTGAACGCTGGCGGCATGCCTTACACATGCAAGTCGAACGGCAGCACGGGTGCTTGCACCTGGTGGCGAGTGGCGAACGGGTGAGTAATACATCGGAACATGTCCTGTAGTGGGGGATAGCCCGGCGAAAGCCGGATTAATACCGCATACGATCCACGGATGAAAGCGGGGGACCTTCGGGCCTCGCGCTATAGGGTTGGCCGATGGCTGATTAGCTAGTTGGTGGGGTAAAGGCCTACCAAGGCGACGATCAGTAGCTGGTCTGAGAGGACGACCAGCCACACTGGGACTGAGACACGGCCCAG, ATTGAACGCTGGCGGCATGCTTTACACATGCAAGTCGAACGGCAGCGCGGGGCAACCTGGCGGCGAGTGGCGAACGGGTGAGTAATATATCGGAACGTACCCAAGAGTGGGGGATAACGTAGCGAAAGTTACGCTAATACCGCATACGATCTAAGGATGAAAGTGGGGGATTCGCAAGAACCTCATGCTCTTGGAGCGGCCGATATCTGATTAGCTAGTTGGTGAGGTAAAGGCTCACCAAGGCGACGATCAGTAGCTGGTCTGAGAGGACGACCAGCCACACTGGAACTGAGACACGGTCCAG, ATTGAACGCTGGCGGCATGCCTTACACATGCAAGTCGAACGGCAGCACGGGGCAACCTGGTGGCGAGTGGCGAACGGGTGAGTAATATATCGGAACGTACCCTGGAGTGGGGGATAACGTAGCGAAAGTTACGCTAATACCGCATACGATCTAAGGATGAAAGTGGGGGATTCGCAAGAACCTCATGCTCCTGGAGCGGCCGATATCTGATTAGCTAGTTGGTGGGGTAAAGGCCCACCAAGGCTTCGATCAGTAGCTGGTCTGAGAGGACGACCAGCCACACTGGAACTGAGACACGGTCCAG, ...
## Node labels:
##   n1, n2, n3, n4, n5, n6, ...
## 
## Rooted; includes branch lengths.

Transform data using PhILR

The function philr::philr() implements a user friendly wrapper for the key steps in the philr transform.

  1. Convert the phylogenetic tree to its sequential binary partition (SBP) representation using the function philr::phylo2sbp()
  2. Calculate the weighting of the taxa (aka parts) or use the user specified weights
  3. Built the contrast matrix from the SBP and taxa weights using the function philr::buildilrBasep()
  4. Convert ASV table to relative abundance (using philr::miniclo()) and ‘shift’ dataset using the weightings via the function philr::shiftp().
  5. Transform the data to PhILR space using the function philr::ilrp()
  6. (Optional) Weight the resulting PhILR space using phylogenetic distance. These weights are either provided by the user or can be calculated by the function philr::calculate.blw().

Note: The preprocessed ASV table should be passed to the function philr::philr() before it is closed (normalized) to relative abundances, as some of the preset weightings of the taxa use the original count data to down weight low abundance taxa.

Here we will use the same weightings as used in the original paper.

philr <- philr(table_philr, tree, part.weights = 'enorm.x.gm.counts', ilr.weights = 'blw.sqrt')
philr[1:5,1:5]
##                  n1         n2         n3          n4         n5
## AqFl2-01 -1.1126775 -0.1697501 -0.6077461  0.61417110 -0.7918057
## AqFl2-02 -0.2726068  0.3901253 -0.2365762  0.09100653  0.1440859
## AqFl2-03  0.2557023  0.3901253 -0.2365762  0.09100653  0.1440859
## AqFl2-04 -0.2836844  0.9569463  0.5855174  0.51055637 -0.9430634
## AqFl2-05 -0.4334598  0.8979867  0.5000048 -0.21598914  0.1440859

Now the transformed data is represented in terms of balances and since each balance is associated with a single internal node of the tree, we denote the balances using the same names we assigned to the internal nodes (e.g., n1).

Ordination in PhILR space

Euclidean distance in PhILR space can be used for ordination analysis. First we compute the Euclidean distance and run PCoA using the ordinate() function from the phyloseq package.

# Compute Euclidean distance on PhILR transformed data
dist_philr <- dist(philr, method = "euclidean")

# Ordination by PCoA
ord_philr <- ordinate(phyloseq, 'PCoA', distance = dist_philr)

Extract principal coordinates and variance explained for plotting.

pco_philr <- as.data.frame(ord_philr$vectors) %>%
  rownames_to_column("SampleID") %>%
  full_join(., rownames_to_column(metadata, "SampleID"), by = "SampleID") 

labs_philr <- paste0("PCo", 1:length(ord_philr$values$Eigenvalues), ": ", 
                     round((100*ord_philr$values$Eigenvalues/sum(ord_philr$values$Eigenvalues)),1), "%")

2D plot

p_philr <- ggplot(pco_philr, aes(x = Axis.1, y = Axis.2, color = SampleType)) +
  geom_hline(yintercept = 0) +
  geom_vline(xintercept = 0) +
  geom_point(size = 4) +
  labs(title = "PCoA of PhILR transformed data", color = "Sample type",
       x = labs_philr[1],
       y = labs_philr[2]) +
  scale_color_manual(values = brewer.pal(n = 12, name = "Paired")[c(2,1,8,7)]) +
  theme_cowplot() +
  panel_border(colour = "black") 

p_philr

Interactive 3D plot

plot_ly(x = pco_philr[,"Axis.1"], y = pco_philr[,"Axis.2"], z = pco_philr[,"Axis.3"], 
        type = "scatter3d", mode = "markers", color = metadata$SampleType, 
        colors = brewer.pal(n = 12, name = "Paired")[c(2,1,8,7)]) %>%
        layout(scene = list(xaxis = list(title = labs_philr[1]),
                            yaxis = list(title = labs_philr[2]),
                            zaxis = list(title = labs_philr[3])
                        ))

Assemble plots

# Get legend
legend <- get_legend(p_jac)

# Reduce point size
p_jac$layers[[3]]$aes_params$size <- 3 
p_uwuf$layers[[3]]$aes_params$size <- 3
p_aitchison$layers[[3]]$aes_params$size <- 3
p_philr$layers[[3]]$aes_params$size <- 3

# Assemble plots
ps <- plot_grid(
  p_jac + theme(legend.position = "none", plot.title = element_blank()), 
  p_uwuf + theme(legend.position = "none", plot.title = element_blank()),
  p_aitchison + theme(legend.position = "none", plot.title = element_blank()), 
  p_philr + theme(legend.position = "none", plot.title = element_blank()),
  ncol = 2, labels = "AUTO", align = 'vh')

# Add legend to the assembled plot
plot_grid(ps, legend, rel_widths = c(6, 1))

# Export the plot
ggsave(here("result", "figures", "Figure 4.tiff"), width = 10, height = 6,
       units = "in", dpi = 300, compression = "lzw")

PERMANVOA

We’ll need to run two-way PERMANOVA with 2 nested random effects. Unfortunately, we have not found solutions on how to do this using the adonis() function from the vegan package. We’ll export the distance matrices and run PERMANOVA using the PERMANOVA+ add-on in PRIMER v7.

Export distance matrix

# Make a list of distance matrices
dist <- list(dist_jac, dist_uwuf, dist_aitchison, dist_philr)
names(dist) <- c("dist_jac", "dist_uwuf", "dist_aitchison", "dist_philr")

# Export distance matrices
lapply(
  seq_along(dist), 
  function(x) 
  {
  # Export PHILR distance directly
  if(grepl("philr", names(dist)[x])){
    write.table(as.data.frame(as.matrix(dist[[x]])), 
                here("data", "permanova", paste0(names(dist)[x], ".tsv")), 
                sep = "\t", col.names = NA, row.names = TRUE)
  } else {
  # For the other distance matrices: extract distance matrix firt, then export  
    write.table(as.data.frame(as.matrix(dist[[x]]$data)), 
                here("data", "permanova", paste0(names(dist)[x], ".tsv")), 
                sep = "\t", col.names = NA, row.names = TRUE)} 
  }
)

Import PERMANOVA results

Import and merge results produced by the PERMANOVA.

# Import results on main effects and interaction effect
permanova_main <- read_csv(here("data", "permanova", "main_effects.csv"))

# Import results on conditional contrasts
permanova_contrasts <- read_csv(here("data", "permanova", "conditional_contrasts.csv"))

# Apply multiple testing correction for conditional contrasts
permanova_contrasts <- column_to_rownames(permanova_contrasts, "Distance matrix") %>%
  t() %>%
  apply(2, function(x) p.adjust(x, method = "fdr")) %>%
  apply(2, function(x) round(x, digits = 3)) %>%
  t() %>%
  as.data.frame() %>%
  rownames_to_column("Distance matrix")

# Merge results
permanova <- full_join(permanova_main, permanova_contrasts, by = "Distance matrix")

# Format results as a gt table 
permanova <- gt(permanova) %>%
  tab_header(title = "PERMANOVA") %>%
  tab_footnote(footnote = md("Monte Carlo *p* value"),
               locations = cells_data(columns = c(2, 5), rows = 2)) %>%
  tab_footnote(footnote = md("Monte Carlo *p* value"),
               locations = cells_data(columns = c(7, 8), rows = 3)) %>%
  tab_spanner(label = "Main effects", columns = 2:3) %>%
  tab_spanner(label = "Conditional contrasts", columns = 5:ncol(permanova)) %>%
  cols_align(align = "center", columns = 2:ncol(permanova))

permanova
PERMANOVA
Distance matrix Main effects Interaction Conditional contrasts
Diet Sample origin REF-DIC VS. IM-DIC REF-DIM VS. IM-DIM REF-DIC VS. REF-DIM IM-DIC VS. IM-DIM
Jaccard 0.001 0.001 0.001 0.001 0.001 0.001 0.001
Unweighted UniFrac 0.0011 0.001 0.001 0.0011 0.001 0.001 0.001
Aitchison 0.001 0.003 0.004 0.002 0.004 0.0041 0.0021
PHILR (Euclidean) 0.001 0.001 0.001 0.001 0.005 0.001 0.001

1 Monte Carlo p value

PERMDISP

Since the PERMANOVA is testing differences in both location and dispersion effects, it’s important to test the homogeneity of multivariate dispersions following a significant PERMANVOA result. Results from the PERMANOVA suggested little evidence of NetPen effect for all the distance mitrices measured. Hence, we’ll use individual fish for the testing of homogeneity of multivariate dispersions. As there’s a significant interaction between the diet and sample origin effect for all the distance matrices used, we’ll assess homogeneity of multivariate dispersions for each main effect stratified by the levels of the remaining main effect.

The homogeneity of multivariate dispersions can be assessed visually (PCoA plot/boxplot) or via a statistical test called PERMDISP, which is implemented in R by the betadisper() function from the vegan package.

Jaccard distance

PERMDISP

# Remove the sample, which was excluded during the rarefying of ASV table, from metadata 
metadata1 <- rownames_to_column(metadata, "SampleID") %>%
  filter(SampleID %in% rownames(as.matrix(dist_jac$data))) %>%
  column_to_rownames("SampleID") 

# PERMDISP
disp_jac <- betadisper(dist_jac$data, metadata1$SampleType, type = "median")

# Permutaion test
permdisp_jac <- permutest(disp_jac, pairwise = TRUE, permutations = 999)
permdisp_jac
## 
## Permutation test for homogeneity of multivariate dispersions
## Permutation: free
## Number of permutations: 999
## 
## Response: Distances
##           Df  Sum Sq  Mean Sq      F N.Perm Pr(>F)    
## Groups     3 0.13922 0.046406 15.536    999  0.001 ***
## Residuals 66 0.19714 0.002987                         
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Pairwise comparisons:
## (Observed p-value below diagonal, permuted p-value above diagonal)
##            REF-DID    REF-DIM     IM-DID IM-DIM
## REF-DID            1.1000e-02 1.0000e-03  0.260
## REF-DIM 2.0987e-02            5.0000e-03  0.103
## IM-DID  3.2562e-08 4.8476e-03             0.001
## IM-DIM  2.6967e-01 1.0469e-01 1.8989e-07

Visual inspection

Boxplot showing the distances to centroid for each group

bp_jac <- data.frame(dist = disp_jac$distances, group = disp_jac$group) %>%
  ggplot(aes(x = group,y = dist)) +
    geom_boxplot(fill = "lightblue") +
    labs(x = "Sample type", y = "Distance to centroid", title = "Jaccard distance") +
    theme_minimal(base_size = 14) +
    theme(plot.title = element_text(hjust = 0.5))

bp_jac

Unweighted UniFrac distance

PERMDISP

# PERMDISP
disp_uwuf <- betadisper(dist_uwuf$data, metadata1$SampleType, type = "median")

# Permutaion test
permdisp_uwuf <- permutest(disp_uwuf, pairwise = TRUE, permutations = 999)
permdisp_uwuf
## 
## Permutation test for homogeneity of multivariate dispersions
## Permutation: free
## Number of permutations: 999
## 
## Response: Distances
##           Df   Sum Sq   Mean Sq      F N.Perm Pr(>F)   
## Groups     3 0.071232 0.0237442 5.0821    999  0.006 **
## Residuals 66 0.308358 0.0046721                        
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Pairwise comparisons:
## (Observed p-value below diagonal, permuted p-value above diagonal)
##            REF-DID    REF-DIM     IM-DID IM-DIM
## REF-DID            0.11300000 0.00200000  0.081
## REF-DIM 0.10444956            0.13300000  0.662
## IM-DID  0.00033246 0.12196884             0.006
## IM-DIM  0.08062296 0.66033198 0.00659646

Visual inspection

Boxplot showing the distances to centroid for each group

bp_uwuf <- data.frame(dist = disp_uwuf$distances, group = disp_uwuf$group) %>%
  ggplot(aes(x = group,y = dist)) +
    geom_boxplot(fill = "lightblue") +
    labs(x = "Sample type", y = "Distance to centroid", title = "Unweighted UniFrac distance") +
    theme_minimal(base_size = 14) +
    theme(plot.title = element_text(hjust = 0.5))

bp_uwuf

Aitchison distance

PERMDISP

# PERMDISP
disp_aitchison <- betadisper(dist_aitchison$data, metadata$SampleType, type = "median")

# Permutaion test
permdisp_aitchison <- permutest(disp_aitchison, pairwise = TRUE, permutations = 999)
permdisp_aitchison
## 
## Permutation test for homogeneity of multivariate dispersions
## Permutation: free
## Number of permutations: 999
## 
## Response: Distances
##           Df  Sum Sq Mean Sq      F N.Perm Pr(>F)  
## Groups     3  2.4937 0.83125 3.6099    999  0.017 *
## Residuals 68 15.6583 0.23027                       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Pairwise comparisons:
## (Observed p-value below diagonal, permuted p-value above diagonal)
##           REF-DID   REF-DIM    IM-DID IM-DIM
## REF-DID           0.0210000 0.4530000  0.769
## REF-DIM 0.0237878           0.0040000  0.023
## IM-DID  0.4557318 0.0025285            0.277
## IM-DIM  0.7649855 0.0135925 0.2502807

Visual inspection

Boxplot showing the distances to centroid for each group

bp_aitchison <- data.frame(dist = disp_aitchison$distances, group = disp_aitchison$group) %>%
  ggplot(aes(x = group,y = dist)) +
    geom_boxplot(fill = "lightblue") +
    labs(x = "Sample type", y = "Distance to centroid", title = "Aitchison distance") +
    theme_minimal(base_size = 14) +
    theme(plot.title = element_text(hjust = 0.5))

bp_aitchison

PHILR transformed euclidean distance

PERMDISP

# PERMDISP
disp_philr <- betadisper(dist_philr, metadata$SampleType, type = "median")

# Permutaion test
permdisp_philr <- permutest(disp_philr, pairwise = TRUE, permutations = 999)
permdisp_philr
## 
## Permutation test for homogeneity of multivariate dispersions
## Permutation: free
## Number of permutations: 999
## 
## Response: Distances
##           Df  Sum Sq Mean Sq      F N.Perm Pr(>F)
## Groups     3  158.44  52.813 1.5659    999  0.205
## Residuals 68 2293.45  33.727                     
## 
## Pairwise comparisons:
## (Observed p-value below diagonal, permuted p-value above diagonal)
##          REF-DID  REF-DIM   IM-DID IM-DIM
## REF-DID          0.091000 0.120000  0.535
## REF-DIM 0.086919          0.973000  0.209
## IM-DID  0.107439 0.959094           0.266
## IM-DIM  0.517724 0.218881 0.259210

Visual inspection

Boxplot showing the distances to centroid for each group

bp_philr <- data.frame(dist = disp_philr$distances, group = disp_philr$group) %>%
  ggplot(aes(x = group,y = dist)) +
    geom_boxplot(fill = "lightblue") +
    labs(x = "Sample type", y = "Distance to centroid", 
         title = "PHILR transformed Euclidean distance") +
    theme_minimal(base_size = 14) +
    theme(plot.title = element_text(hjust = 0.5))

bp_philr

Overview of PERMDISP

Table

Here we gather and format PERMDISP results.

# Gather PERMDISP results
permdisp <- list(Jaccard = permdisp_jac, "Unweighted UniFrac" = permdisp_uwuf, 
                 Aitchison = permdisp_aitchison, "PHILR (Euclidean)" = permdisp_philr) %>%
  map_df(~.x$pairwise$permuted) %>%
  mutate(contrasts = names(permdisp_jac$pairwise$permuted)) %>%
  filter(!contrasts %in% c("REF-DID-IM-DIM", "REF-DIM-IM-DID")) %>%
  arrange(match(contrasts, c("REF-DID-IM-DID", "REF-DIM-IM-DIM", "REF-DID-REF-DIM", "IM-DID-IM-DIM"))) %>%
  column_to_rownames("contrasts") %>%
  # multiple testing correction
  apply(2, function(x) p.adjust(x, method = "fdr")) %>%
  # format digits
  apply(2, function(x) round(x, digits = 3)) %>%
  t() %>%
  as.data.frame() %>%
  rename_all(~gsub("DID-", "DID VS. ", .x)) %>%
  rename_all(~gsub("DIM-", "DIM VS. ", .x)) %>%
  rownames_to_column("Distance matrix")

# Format results as a gt table   
permdisp <- gt(permdisp) %>%
  tab_header(title = "PERMDISP") %>%
  tab_spanner(label = "Conditional contrasts", columns = 2:ncol(permdisp)) %>%
  cols_align(align = "center", columns = 2:ncol(permdisp))

permdisp
PERMDISP
Distance matrix Conditional contrasts
REF-DID VS. IM-DID REF-DIM VS. IM-DIM REF-DID VS. REF-DIM IM-DID VS. IM-DIM
Jaccard 0.002 0.103 0.015 0.002
Unweighted UniFrac 0.008 0.662 0.151 0.012
Aitchison 0.453 0.046 0.046 0.369
PHILR (Euclidean) 0.240 0.266 0.240 0.266

Boxplot

Assemble boxplots

bps <- plot_grid(bp_jac, bp_uwuf, bp_aitchison, bp_philr, ncol = 2, labels = "AUTO", align = 'vh')
bps

Acknowledgements

The PhILR codes are based on the package viggette written by Justin D Silverman.

Session information

sessionInfo()
## R version 3.6.0 (2019-04-26)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Debian GNU/Linux 9 (stretch)
## 
## Matrix products: default
## BLAS/LAPACK: /usr/lib/libopenblasp-r0.2.19.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
##  [6] LC_MESSAGES=C              LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] gt_0.1.0           vegan_2.5-6        lattice_0.20-38    permute_0.9-5      ape_5.4-1          philr_1.12.0       phyloseq_1.30.0   
##  [8] plotly_4.9.2       cowplot_1.0.0      RColorBrewer_1.1-2 forcats_0.5.0      stringr_1.4.0      dplyr_0.8.5        purrr_0.3.3       
## [15] readr_1.3.1        tidyr_1.0.2        tibble_2.1.3       ggplot2_3.3.0      tidyverse_1.3.0    here_0.1           knitr_1.28        
## 
## loaded via a namespace (and not attached):
##  [1] colorspace_1.4-1    ggtree_2.0.4        rprojroot_1.3-2     XVector_0.26.0      fs_1.3.2            rstudioapi_0.11     farver_2.0.3       
##  [8] fansi_0.4.1         lubridate_1.7.4     xml2_1.2.5          codetools_0.2-16    splines_3.6.0       ade4_1.7-15         jsonlite_1.6.1     
## [15] broom_0.5.5         cluster_2.0.8       dbplyr_1.4.2        BiocManager_1.30.4  compiler_3.6.0      httr_1.4.1          rvcheck_0.1.8      
## [22] backports_1.1.5     assertthat_0.2.1    Matrix_1.2-17       lazyeval_0.2.2      cli_2.0.2           htmltools_0.4.0     tools_3.6.0        
## [29] igraph_1.2.4.1      gtable_0.3.0        glue_1.3.2          reshape2_1.4.3      fastmatch_1.1-0     Rcpp_1.0.3          Biobase_2.46.0     
## [36] cellranger_1.1.0    vctrs_0.2.4         Biostrings_2.54.0   multtest_2.42.0     nlme_3.1-139        iterators_1.0.10    crosstalk_1.1.0.1  
## [43] xfun_0.12           rvest_0.3.5         lifecycle_0.2.0     phangorn_2.5.5      zlibbioc_1.32.0     MASS_7.3-51.4       scales_1.1.0       
## [50] hms_0.5.3           parallel_3.6.0      biomformat_1.14.0   rhdf5_2.30.1        yaml_2.2.1          sass_0.1.1          stringi_1.4.6      
## [57] S4Vectors_0.24.4    foreach_1.4.4       checkmate_2.0.0     tidytree_0.3.3      BiocGenerics_0.32.0 commonmark_1.7      rlang_0.4.5        
## [64] pkgconfig_2.0.3     evaluate_0.14       Rhdf5lib_1.8.0      treeio_1.10.0       htmlwidgets_1.5.1   labeling_0.3        tidyselect_1.0.0   
## [71] plyr_1.8.6          magrittr_1.5        R6_2.4.1            IRanges_2.20.2      generics_0.0.2      DBI_1.1.0           pillar_1.4.3       
## [78] haven_2.2.0         withr_2.1.2         mgcv_1.8-28         survival_2.44-1.1   modelr_0.1.6        crayon_1.3.4        rmarkdown_2.1      
## [85] grid_3.6.0          readxl_1.3.1        data.table_1.12.8   reprex_0.3.0        digest_0.6.25       stats4_3.6.0        munsell_0.5.0      
## [92] viridisLite_0.3.0   quadprog_1.5-8